One simple question can stop a deepfake scammer immediately
An expert gives an easy tip on how to spot this kind of fraudster. Whenever I speak with security experts (particularly those who work on software designed to protect consumers), I always like to ask what their top advice is for combating the latest threats. So, when I had the opportunity to chat with Steve Grobman, chief technology officer at McAfee, I picked his brain about deepfake audio and video scams. Not only are scammers focusing their efforts on everyday people who never suspect they could be targeted, but real-time impersonations of voices and whole likenesses during calls keep getting ever more convincing.
- Information Technology > Security & Privacy (1.00)
- Leisure & Entertainment > Games > Computer Games (0.79)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (19 more...)
- Personal (0.93)
- Research Report > New Finding (0.45)
- Leisure & Entertainment (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- (3 more...)
Parents rejoice! ChatGPT has a new 'Study Mode' that will force students to work through questions step-by-step instead of just getting an answer
An example of how 'study mode' would work. Experts say it is 'especially useful' for homework help, test prep and learning new topics. It also features knowledge checks in the form of quizzes and open-ended questions, along with personalised feedback. The mode can also easily be toggled on and off during a conversation. Those wanting to use it should select 'Study and learn' from the tools menu in ChatGPT. 'Instead of doing the work for them, study mode encourages students to think critically about their learning,' said Robbie Torney, senior director of AI Programs at Common Sense Media.
- Europe > United Kingdom > Wales (0.05)
- Europe > United Kingdom > Scotland (0.05)
- Europe > United Kingdom > England (0.05)
Life-Cycle Routing Vulnerabilities of LLM Router
Lin, Qiqi, Ji, Xiaoyang, Zhai, Shengfang, Shen, Qingni, Zhang, Zhi, Fang, Yuejian, Gao, Yansong
Large language models (LLMs) have achieved remarkable success in natural language processing, yet their performance and computational costs vary significantly. LLM routers play a crucial role in dynamically balancing these tradeoffs. While previous studies have primarily focused on routing efficiency, security vulnerabilities throughout the entire LLM router life cycle, from training to inference, remain largely unexplored. In this paper, we present a comprehensive investigation into the life-cycle routing vulnerabilities of LLM routers. We evaluate both white-box and black-box adversarial robustness, as well as backdoor robustness, across several representative routing models under extensive experimental settings. Our experiments uncover several key findings: 1) Mainstream DNN-based routers tend to exhibit the weakest adversarial and backdoor robustness, largely due to their strong feature extraction capabilities that amplify vulnerabilities during both training and inference; 2) Training-free routers demonstrate the strongest robustness across different attack types, benefiting from the absence of learnable parameters that can be manipulated. These findings highlight critical security risks spanning the entire life cycle of LLM routers and provide insights for developing more robust models.
In recent years, large language models (LLMs) such as GPT-3.5 (Brown et al., 2020), GPT-4 (Achiam et al., 2023), and PaLM 2 (Anil et al., 2023) have achieved significant progress in natural language processing tasks, finding widespread applications in open-domain dialogue, question answering, code generation, and other tasks (Gu, 2023; Zhuang et al., 2023; Ghosh et al., 2024). However, different LLMs vary in terms of training data, model size, and computational cost, leading to differences in their strengths, weaknesses, and overall capabilities.
Generally, larger models tend to exhibit stronger performance but come with higher inference costs, whereas smaller models are more computationally efficient but have limited capability in handling complex tasks. LLM Routing (Ding et al., 2024; Ong et al., 2024; Hu et al., 2024) is a state-of-the-art optimization strategy designed to mitigate this trade-off and achieve a balance between response quality and computational cost.
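The routing idea described above can be sketched as a minimal cost-aware dispatcher. The model names, cost figures, and the length-plus-keyword difficulty heuristic below are illustrative assumptions, not the routers evaluated in the paper.

```python
# Minimal sketch of an LLM router: send "easy" queries to a cheap model
# and "hard" ones to an expensive model. The difficulty heuristic
# (query length plus a few trigger words) is purely illustrative.
from dataclasses import dataclass

@dataclass
class ModelSpec:
    name: str
    cost_per_1k_tokens: float  # hypothetical pricing

SMALL = ModelSpec("small-llm", 0.05)
LARGE = ModelSpec("large-llm", 1.00)

HARD_MARKERS = ("prove", "derive", "step by step", "explain why")

def estimate_difficulty(query: str) -> float:
    """Toy difficulty score in [0, 1] from length and marker words."""
    length_score = min(len(query.split()) / 50.0, 1.0)
    marker_score = 0.5 if any(m in query.lower() for m in HARD_MARKERS) else 0.0
    return min(length_score + marker_score, 1.0)

def route(query: str, threshold: float = 0.5) -> ModelSpec:
    """Pick the cheap model unless the query looks hard."""
    return LARGE if estimate_difficulty(query) >= threshold else SMALL

print(route("What is 2 + 2?").name)                                            # → small-llm
print(route("Prove step by step that the sum of two even numbers is even.").name)  # → large-llm
```

A training-free router like this has no learnable parameters to poison, which is one way to read the paper's finding that such routers resist backdoor attacks best.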
- Oceania > Australia > Western Australia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
BoolQuestions: Does Dense Retrieval Understand Boolean Logic in Language?
Zhang, Zongmeng, Zhu, Jinhua, Zhou, Wengang, Qi, Xiang, Zhang, Peng, Li, Houqiang
Dense retrieval, which aims to encode the semantic information of arbitrary text into dense vector representations or embeddings, has emerged as an effective and efficient paradigm for text retrieval, consequently becoming an essential component in various natural language processing systems. These systems typically focus on optimizing the embedding space by attending to the relevance of text pairs, while overlooking the Boolean logic inherent in language, which may not be captured by current training objectives. In this work, we first investigate whether current retrieval systems can comprehend the Boolean logic implied in language. To answer this question, we formulate the task of Boolean Dense Retrieval and collect a benchmark dataset, BoolQuestions, which covers complex queries containing basic Boolean logic and corresponding annotated passages. Through extensive experimental results on the proposed task and benchmark dataset, we draw the conclusion that current dense retrieval systems do not fully understand Boolean logic in language, and there is a long way to go to improve our dense retrieval systems. Furthermore, to promote further research on enhancing the understanding of Boolean logic for language models, we explore Boolean operation on decomposed query and propose a contrastive continual training method that serves as a strong baseline for the research community.
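The failure mode probed above can be illustrated with a toy bag-of-words "embedding" (not the paper's models or the BoolQuestions data): a query containing a negation still shares surface terms with exactly the passages it means to exclude, and similarity alone has no notion of NOT.

```python
# Toy illustration: cosine similarity over bag-of-words vectors ranks a
# passage about the excluded topic ABOVE the truly relevant one, because
# "but not war" contributes the token "war" to the query. All texts and
# vectors here are invented for the example.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = "movies about robots but not about war"
relevant = "a quiet film about robots learning to paint"
violating = "a war film about robots fighting in a war"

# The violating passage matches more query terms ("robots", "about", "war"),
# so naive similarity prefers it over the relevant one.
print(cosine(embed(query), embed(violating)) > cosine(embed(query), embed(relevant)))  # → True
```

Real dense encoders are far richer than bags of words, but the abstract's experiments suggest they inherit a version of this blindness to Boolean structure.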
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Oceania > Australia > Western Australia > Perth (0.04)
- (10 more...)
AIs get worse at answering simple questions as they get bigger
Large language models (LLMs) seem to get less reliable at answering simple questions when they get bigger and learn from human feedback. AI developers try to improve the power of LLMs in two main ways: scaling up – giving them more training data and more computational power – and shaping up, or fine-tuning them in response to human feedback. José Hernández-Orallo at the Polytechnic University of Valencia, Spain, and his colleagues examined the performance of LLMs as they scaled up and shaped up. They looked at OpenAI's GPT series of chatbots, Meta's LLaMA AI models, and BLOOM, developed by a group of researchers called BigScience. The researchers tested the AIs by posing five types of task: arithmetic problems, solving anagrams, geographical questions, scientific challenges and pulling out information from disorganised lists.
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.26)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.06)
Konstruktor: A Strong Baseline for Simple Knowledge Graph Question Answering
Lysyuk, Maria, Salnikov, Mikhail, Braslavski, Pavel, Panchenko, Alexander
While being one of the most popular question types, simple questions such as "Who is the author of Cinderella?" are still not completely solved. Surprisingly, even the most powerful modern Large Language Models are prone to errors on such questions, especially those involving rare entities. At the same time, as an answer may be one hop away from the question entity, one can try to develop a method that uses structured knowledge graphs (KGs) to answer such questions. In this paper, we introduce Konstruktor - an efficient and robust approach that breaks down the problem into three steps: (i) entity extraction and entity linking, (ii) relation prediction, and (iii) querying the knowledge graph. Our approach integrates language models and knowledge graphs, exploiting the power of the former and the interpretability of the latter. We experiment with two named entity recognition and entity linking methods and several relation detection techniques. We show that for relation detection, the most challenging step of the workflow, a combination of relation classification/generation and ranking outperforms other methods. We report Konstruktor's strong results on four datasets.
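Step (iii) of the pipeline can be sketched as a one-hop SPARQL lookup against a Wikidata-style endpoint. The entity and property identifiers below are deliberate placeholders standing in for the outputs of steps (i) and (ii), not verified Wikidata ids.

```python
# Sketch of Konstruktor's final step: once entity linking has produced a
# KG entity id and relation prediction a property id, answering a simple
# question reduces to a single one-hop SPARQL query. The ids used below
# are hypothetical placeholders, not real Wikidata identifiers.
def build_sparql(entity_id: str, property_id: str) -> str:
    """One-hop query: subject --property--> ?answer, with English labels."""
    return (
        "SELECT ?answer ?answerLabel WHERE {\n"
        f"  wd:{entity_id} wdt:{property_id} ?answer .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )

# e.g. for "Who is the author of Cinderella?" after steps (i) and (ii):
entity = "Q_CINDERELLA"   # hypothetical entity-linking output
relation = "P_AUTHOR"     # hypothetical relation-prediction output
print(build_sparql(entity, relation))
```

The interpretability claim in the abstract follows from this shape: the final answer is traceable to one explicit KG triple rather than to opaque model weights.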
- Europe > Austria > Vienna (0.14)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- Europe > Russia (0.04)
- (8 more...)
- Workflow (1.00)
- Research Report > New Finding (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models
Liang, Mengfei, Arun, Archish, Wu, Zekun, Munoz, Cristian, Lutch, Jonathan, Kazim, Emre, Koshiyama, Adriano, Treleaven, Philip
Hallucination, the generation of factually incorrect content, is a growing challenge in Large Language Models (LLMs). Existing detection and mitigation methods are often isolated and insufficient for domain-specific needs, lacking a standardized pipeline. This paper introduces THaMES (Tool for Hallucination Mitigations and EvaluationS), an integrated framework and library addressing this gap. THaMES offers an end-to-end solution for evaluating and mitigating hallucinations in LLMs, featuring automated test set generation, multifaceted benchmarking, and adaptable mitigation strategies. It automates test set creation from any corpus, ensuring high data quality, diversity, and cost-efficiency through techniques like batch processing, weighted sampling, and counterfactual validation. THaMES assesses a model's ability to detect and reduce hallucinations across various tasks, including text generation and binary classification, applying optimal mitigation strategies like In-Context Learning (ICL), Retrieval Augmented Generation (RAG), and Parameter-Efficient Fine-tuning (PEFT). Evaluations of state-of-the-art LLMs using a knowledge base of academic papers, political news, and Wikipedia reveal that commercial models like GPT-4o benefit more from RAG than ICL, while open-weight models like Llama-3.1-8B-Instruct and Mistral-Nemo gain more from ICL. Additionally, PEFT significantly enhances the performance of Llama-3.1-8B-Instruct in both evaluation tasks.
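The reported trend (RAG helping the commercial models tested, ICL helping the open-weight ones) suggests a per-model-family dispatch, sketched below. This is a toy prompt-builder illustrating the two mitigation strategies, not part of the THaMES library; the retriever and examples are stand-ins.

```python
# Toy dispatch between two hallucination mitigations named above:
# RAG (prepend retrieved evidence) vs ICL (prepend worked examples).
# The family->strategy mapping mirrors the abstract's finding but the
# functions and data here are illustrative stand-ins.
from typing import Callable

def mitigate_rag(question: str, retrieve: Callable[[str], str]) -> str:
    """Ground the answer in retrieved context."""
    return f"Context: {retrieve(question)}\n\nQuestion: {question}"

def mitigate_icl(question: str, examples: list) -> str:
    """Show worked examples first (in-context learning)."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {question}\nA:"

STRATEGY = {"commercial": "rag", "open-weight": "icl"}  # toy mapping

def build_prompt(model_family: str, question: str) -> str:
    if STRATEGY[model_family] == "rag":
        return mitigate_rag(question, retrieve=lambda q: "(retrieved passage)")
    return mitigate_icl(question, examples=[("Q1?", "A1.")])

print(build_prompt("commercial", "Who wrote Cinderella?").splitlines()[0])
```

In practice THaMES benchmarks the strategies per model rather than hard-coding a mapping; the table above only encodes the headline result for illustration.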
- Europe > France (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Asia > Singapore (0.04)
Unraveling the Truth: Do LLMs really Understand Charts? A Deep Dive into Consistency and Robustness
Mukhopadhyay, Srija, Qidwai, Adnan, Garimella, Aparna, Ramu, Pritika, Gupta, Vivek, Roth, Dan
Chart question answering (CQA) is a crucial area of Visual Language Understanding. However, the robustness and consistency of current Visual Language Models (VLMs) in this field remain under-explored. This paper evaluates state-of-the-art VLMs on comprehensive datasets, developed specifically for this study, encompassing diverse question categories and chart formats. We investigate two key aspects: 1) the models' ability to handle varying levels of chart and question complexity, and 2) their robustness across different visual representations of the same underlying data. Our analysis reveals significant performance variations based on question and chart types, highlighting both strengths and weaknesses of current models. Additionally, we identify areas for improvement and propose future research directions to build more robust and reliable CQA systems. This study sheds light on the limitations of current models and paves the way for future advancements in the field.
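The robustness protocol described above (same data, different visual renderings) implies a simple consistency metric: how often does the model give the majority answer across renderings? A minimal sketch, where the model outputs are hypothetical stand-ins rather than a real VLM call:

```python
# Sketch of the cross-rendering consistency check: ask the same question
# over several chart renderings of the same data and measure agreement.
# The answer strings below are hypothetical model outputs, not real ones.
from collections import Counter

def consistency(answers: list) -> float:
    """Fraction of renderings agreeing with the majority answer."""
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][1] / len(answers)

# Same underlying data rendered as bar, line, and pie chart:
answers = ["42", "42", "40"]        # hypothetical VLM outputs per rendering
print(consistency(answers))         # → 0.6666666666666666
```

A fully robust model would score 1.0 on every question; scores below that quantify the representation-sensitivity the paper reports.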
- North America > Mexico (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > Pennsylvania (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)